Radix-2³ Fast Fourier Transform for an Embedded Digital Signal Processor

ABSTRACT

In some embodiments, a circuit may include an input configured to receive a signal and a radix-2³ fast Fourier transform (FFT) processing element coupled to the input. The radix-2³ FFT processing element may be configured to control variation of twiddle factors during calculation of a complete FFT through a plurality of processing stages. The radix-2³ FFT processing element may be configured to incorporate the twiddle factors and adder tree matrices of the calculation into a single stage.

CROSS-REFERENCE TO RELATED APPLICATION(S)

The present disclosure is a non-provisional of and claims priority to U.S. Provisional Patent Application No. 62/677,610 filed on May 29, 2019 and entitled “Radix-2³ Fast Fourier Transform for an Embedded Digital Signal Processor”, which is incorporated herein by reference in its entirety.

FIELD

The present disclosure is generally related to devices, systems, and methods configured to determine a fast Fourier transform (FFT), and more particularly to a radix-2³ FFT that can be embedded in a digital signal processor (DSP).

BACKGROUND

The Discrete Fourier Transform (DFT) is a mathematical procedure that is used in a wide variety of applications, from image processing to radio communications. Further, the DFT can be implemented in computers or dedicated circuitry. Further, the DFT is at the center of the processing that takes place inside a digital signal processor.

It is known that a DFT can be written as the sum of two discrete Fourier transforms, each of length N/2. One of the two DFTs can be formed from the even-numbered points of the original data of size N, and the other from the odd-numbered points. The Fast Fourier Transform allowed the DFT to be evaluated with a significant reduction in the amount of calculation required, allowing the DFT of a sampled signal to be obtained rapidly and efficiently.

SUMMARY

In some embodiments, circuits, devices, systems, and methods described herein may enhance the efficiency of a DFT operation used to process input/output data by avoiding trivial multiplication operations. In some embodiments, the circuits, devices, systems and methods may utilize a simple mapping from the three indices (FFT stage, butterfly, and element) to the addresses of the input/output data with its corresponding multiplier coefficients.

In some embodiments, a radix-2³ FFT can be used to reduce a computational load by reducing the number of coefficient multipliers (twiddle factors) utilized to compute an FFT as compared to the conventional radix-2 FFT. In a particular embodiment, the radix-2³ FFT can be configured to reduce the memory accesses, and further, the multiplication by

$\pm \frac{\sqrt{2}}{2} \pm j\frac{\sqrt{2}}{2}$

can also be predicted, where the number of arithmetic operations required for the complex multiplication can be reduced from 6 to 2, thereby improving computational performance.

In some embodiments, a circuit may include an input configured to receive a signal and a radix-2³ fast Fourier transform (FFT) processing element coupled to the input. The radix-2³ FFT processing element may be configured to control variation of twiddle factors during calculation of a complete FFT through a plurality of processing stages. The radix-2³ FFT processing element may be configured to incorporate the twiddle factors and adder tree matrices of the calculation into a single stage.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts a graph of a Discrete Fourier Transform (DFT) decomposition.

FIG. 2 depicts three stages in the computation of an 8-point Decimation in Time (DIT) DFT.

FIG. 3 depicts a graph of a basic butterfly computation for the DIT FFT algorithm.

FIG. 4 depicts a signal flow graph of an 8-point DIT FFT.

FIG. 5 depicts three stages of an 8-point DIF FFT algorithm.

FIG. 6 depicts a butterfly computation for a decimation in frequency (DIF) FFT algorithm.

FIG. 7 depicts stages of an 8-point DIF FFT algorithm.

FIG. 8 depicts a radix-8 DIT butterfly, in accordance with certain embodiments of the present disclosure.

FIG. 9 depicts a signal flow graph of an 8-point DIT FFT, in accordance with certain embodiments of the present disclosure.

FIG. 10 depicts a graph of the 8^(th) root of unity, in accordance with certain embodiments of the present disclosure.

FIG. 11 depicts a graph of a Radix-2³ FFT butterfly structure for a trivial computation, in accordance with certain embodiments of the present disclosure.

FIG. 12 depicts a graph of a Radix-2³ FFT butterfly structure for a non-trivial computation, in accordance with certain embodiments of the present disclosure.

FIG. 13 depicts a graph of a percentage reduction of clock cycles as a function of the FFT length for a timing clock and a reference clock, in accordance with certain embodiments of the present disclosure.

FIG. 14 depicts a block diagram of a signal processing system including a Radix-2³ FFT butterfly structure, in accordance with certain embodiments of the present disclosure.

In the following discussion, the same reference numbers are used in the various embodiments to indicate the same or similar elements.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

Circuits, devices, systems, and methods described herein may enhance the efficiency of a DFT operation used to process input/output data by avoiding trivial multiplication operations. In some embodiments, the circuits, devices, systems and methods may utilize a simple mapping from the three indices (FFT stage, butterfly, and element) to the addresses of the input/output data with its corresponding multiplier coefficients.

In some embodiments, a radix-2³ FFT can be used to reduce a computational load by reducing the number of coefficient multipliers (twiddle factors) utilized to compute an FFT as compared to the conventional radix-2 FFT. In a particular embodiment, the radix-2³ FFT can be configured to reduce the memory accesses, and further, the multiplication by

$\pm \frac{\sqrt{2}}{2} \pm j\frac{\sqrt{2}}{2}$

can also be predicted, where the number of arithmetic operations required for the complex multiplication can be reduced from 6 to 2, thereby improving computational performance.

In some embodiments, a circuit may include an input configured to receive a signal and a radix-2³ fast Fourier transform (FFT) processing element coupled to the input. The radix-2³ FFT processing element may be configured to control variation of twiddle factors during calculation of a complete FFT through a plurality of processing stages. The radix-2³ FFT processing element may be configured to incorporate the twiddle factors and adder tree matrices of the calculation into a single stage.

FIG. 1 depicts a graph 100 of a Discrete Fourier Transform (DFT) decomposition. The definition of the DFT is represented by the following equation:

$X_{[k]} = \sum_{n=0}^{N-1} x_{[n]} w_{N}^{nk}, \quad k \in [0, N-1], \qquad \text{(Equation 1)}$

where x[n] is the input sequence, X[k] is the output sequence, N is the transform length,

$w_{N}^{nk} = e^{-j\left(\frac{2\pi}{N}\right)nk}$

is called the twiddle factor in the butterfly structure, and j²=−1. Both x[n] and X[k] are complex number sequences.
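As a point of reference, Equation 1 can be evaluated directly in O(N²) operations. The following is a minimal sketch, not part of the disclosed circuit; the function name `dft` is a hypothetical name chosen for illustration.

```python
import cmath

def dft(x):
    """Direct evaluation of Equation 1: X[k] = sum_n x[n] * w_N^(n*k)."""
    N = len(x)
    out = []
    for k in range(N):
        acc = 0j
        for n in range(N):
            # Twiddle factor w_N^(nk) = exp(-j * 2*pi * n*k / N)
            acc += x[n] * cmath.exp(-2j * cmath.pi * n * k / N)
        out.append(acc)
    return out
```

A unit impulse at n=0 transforms to a constant sequence, and a constant sequence transforms to a single non-zero bin at k=0, which gives a quick sanity check of the sign convention.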

The graph 100 depicts a sixteen-bit input sequence at 102, which can be decomposed into two signals of eight bits each as shown at 104. It should be understood that a decimation-in-time (DIT) FFT algorithm (sometimes called a “Cooley-Tukey FFT algorithm”) first rearranges the input elements into bit-reverse order, and then builds up the output transform in log₂N iterations. In the DIT process, the input data is subdivided into two sets of even-numbered and odd-numbered data, as shown by the first decomposition 104 in the graph 100. The two signals of eight bits can be further decomposed into four signals of four bits each, as shown at 106. The four signals of four bits each can be decomposed into eight signals of two bits each, at 108. The eight signals can be further decomposed into sixteen signals of one bit each, at 110.

If N/2 is even, as it is when N is equal to a power of 2, then the DFTs of each of the N/2 points can be computed by breaking each of the sums into two N/4-point DFTs, which can be combined to yield the N/2-point DFTs. In the example of FIG. 1, an N-point signal can be decomposed into N signals, each of which includes a single point. In some embodiments, each stage may use an interlace decomposition, separating the even-numbered and odd-numbered samples. If the system is configured to decompose the four signals into eight signal point transforms, the system may decompose N into N/4 and N/4 into N/8 point transforms. The system may continue until left with only 2-point transforms; this requires m stages, where m=log₂N, as shown in FIG. 2.
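The interlace decomposition described above can be sketched recursively. This is an illustrative textbook radix-2 DIT recursion, assuming N is a power of 2; it is not the ordered-input, ordered-output structure developed later, and the name `fft_dit_recursive` is hypothetical.

```python
import cmath

def fft_dit_recursive(x):
    """Radix-2 DIT FFT: split into even/odd samples until 1-point DFTs remain."""
    N = len(x)
    if N == 1:
        return list(x)  # a 1-point DFT is the sample itself
    if N % 2:
        raise ValueError("length must be a power of 2")
    even = fft_dit_recursive(x[0::2])  # N/2-point DFT of even-numbered samples
    odd = fft_dit_recursive(x[1::2])   # N/2-point DFT of odd-numbered samples
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]  # w_N^k * O[k]
        out[k] = even[k] + t
        out[k + N // 2] = even[k] - t  # uses w_N^(k + N/2) = -w_N^k
    return out
```

Each level of recursion corresponds to one of the m=log₂N stages, and each stage performs N/2 two-point butterflies.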

FIG. 2 depicts a system 200 including three stages 202, 204, and 206 in the computation of an 8-point Decimation in Time (DIT) DFT. At a first stage 202, a two-point DFT receives two inputs and provides two outputs. At a second stage 204, the block combines four inputs from the first stage 202 and provides four outputs. At a third stage 206, the block combines four-point DFTs to produce an eight-point DIT DFT.

FIG. 3 depicts a graph 300 of a basic butterfly computation for the DIT FFT algorithm. The graph 300 may include a summing node 302 including a first input coupled to a node 304, a second input coupled to a node 306, and an output coupled to a node 308. The graph 300 may include a summing node 310 including a first input coupled to a node 304, a second input coupled to a node 306, and an output coupled to a node 312. The graph 300 further includes a butterfly operation 314 coupled to the inputs 308 and 312. Other embodiments are also possible.

It is also possible to derive FFT algorithms that first go through a set of log₂N iterations on the input data and rearrange the output values into bit-reverse order. This type of FFT algorithm is sometimes referred to as a decimation-in-frequency (DIF) or Sande-Tukey FFT algorithm. An example of an 8-point DIT FFT is described below with respect to FIG. 4.

FIG. 4 depicts a signal flow graph 400 of an 8-point DIT FFT. The output sequences X_((k)) are decimated (split) into the even-numbered samples and odd-numbered samples. Then, the DIF is obtained by performing the butterfly computation (in place computation or post multiplication technique).

Briefly, the basic operation of a radix-r butterfly includes combining r inputs to provide r outputs via the following operation:

X=B_(r)x,   (Equation 2)

where x=[x₍₀₎, x₍₁₎, . . . , x_((r−1))]^(T) is the input vector, X=[X₍₀₎, X₍₁₎, . . . , X_((r−1))]^(T) is the output vector, and T denotes the transpose of the vector.

The value B_(r) is the r×r butterfly matrix, which can be expressed as follows:

B_(r)=W_(N)T_(r),   (Equation 3)

for the decimation in frequency (DIF) process. The value B_(r) of the r×r butterfly matrix for the decimation in time (DIT) process can be expressed as follows:

B_(r)=T_(r)W_(N)   (Equation 4)

where, for both cases, the value W_(N) is defined as follows:

$\begin{matrix}{{W_{N} = {{diag}\left( {w_{N}^{0},w_{N}^{p},w_{N}^{2p},\ldots \mspace{14mu},w_{N}^{{({r - 1})}p}} \right)}},{and}} & \left( {{Equation}\mspace{14mu} 5} \right) \\{T_{r} = {\begin{bmatrix}w_{N}^{0} & w_{N}^{0} & w_{N}^{0} & \ldots & \ldots & w_{N}^{0} \\w_{N}^{0} & w_{N}^{N/r} & w_{N}^{2{N/r}} & \ldots & \ldots & w_{N}^{{({r - 1})}{N/r}} \\w_{N}^{0} & w_{N}^{2{N/r}} & w_{N}^{4{N/r}} & \ldots & \ldots & w_{N}^{2{({r - 1})}{N/r}} \\\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\w_{N}^{0} & w_{N}^{{({r - 1})}{N/r}} & w_{N}^{2{({r - 1})}{N/r}} & \ldots & \ldots & w_{N}^{{({r - 1})}^{2}{N/r}}\end{bmatrix}.}} & \left( {{Equation}\mspace{14mu} 6} \right)\end{matrix}$
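The butterfly of Equations 2 through 6 can be sketched as a small matrix product. The sketch below is illustrative only: the function names are hypothetical, the diagonal of W_(N) is passed in as a plain list, and the DIT ordering B_(r)=T_(r)W_(N) of Equation 4 is assumed.

```python
import cmath

def adder_tree_matrix(r, N):
    """Build T_r of Equation 6: element (l, m) is w_N^(l * m * N / r)."""
    step = N // r
    return [[cmath.exp(-2j * cmath.pi * ((l * m * step) % N) / N)
             for m in range(r)]
            for l in range(r)]

def dit_butterfly(x, twiddles, N):
    """Apply X = T_r * (W_N * x), where `twiddles` is the diagonal of W_N."""
    r = len(x)
    T = adder_tree_matrix(r, N)
    scaled = [w * xi for w, xi in zip(twiddles, x)]  # W_N x (diagonal product)
    return [sum(T[l][m] * scaled[m] for m in range(r)) for l in range(r)]
```

For r=2 the adder tree matrix reduces to [[1, 1], [1, −1]], so the butterfly yields the sum and the difference of the two twiddled inputs.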

The signal flow graph 400 may include a first stage 402, a second stage 404, and a third stage 406, which may be configured to receive eight inputs and to generate an eight-point DIF FFT output.

FIG. 5 depicts three stages of an 8-point DIF FFT algorithm 500. The algorithm 500 may include a first stage 502, a second stage 504, and a third stage 506. The first stage 502 may receive eight inputs and may produce eight inputs for the second stage 504, which produces eight outputs. The third stage 506 may receive the eight outputs of the second stage 504 and may produce the DIF FFT output.

FIG. 6 depicts a butterfly computation 600 for a decimation in frequency (DIF) FFT algorithm. The computation 600 may include a summing node 602 including a first input coupled to a node 604, a second input coupled to a node 606, and an output coupled to a node 608. The computation 600 may further include a summing node 610 including a first input coupled to the node 604, a second input coupled to the node 606, and an output coupled to a node 612. The computation 600 may further include a multiplication stage 614.

FIG. 7 depicts stages of an 8-point DIF FFT algorithm 700. The algorithm 700 may include a first stage 702, a second stage 704, and a third stage 706 that may cooperate to sort the output data in normal order to provide an output in bit-reversed order.

One of the bottlenecks in most applications, where high performance is required, is the FFT/IFFT processor. Given that higher radix implementations are attractive for reduction in computations, researchers have sought a higher radix butterfly implementation, because the higher radix will reduce automatically the communication load. However, the higher radix has typically added to the computational load. While attempts have been made to reduce the computational load by factoring the adder matrix (or by simplification of adder tree), conventional attempts have not provided a complete solution for the FFT problem due to the increasing complexity of the butterflies for higher radices introduced by the added multipliers in the butterfly's critical path, as depicted in FIG. 8.

FIG. 8 depicts a radix-8 DIT butterfly 800, in accordance with certain embodiments of the present disclosure. In this example, the radix-8 DIT butterfly 800 may include a plurality of multiplier nodes 802, which are each coupled to one of a plurality of inputs 804. The butterfly 800 may further include a plurality of summing nodes 806, 810, and 814, and additional multiplier nodes 808 and 812. In this example, the multiplier node 808B and the multiplier node 812A may be in a critical path and may represent additional multipliers that may not be present in lower valued radices and thus add to the computational load. In FIG. 8, the dashed line may represent a butterfly critical path.

It should be appreciated that the elements of the adder tree matrix T_(r) and the elements of the twiddle factor matrix both contain twiddle factors. By controlling the variation of the twiddle factors during the calculation of a complete FFT, the twiddle factors and the adder tree matrices can be incorporated in a single stage of calculation.

Therefore, by defining [T_(r)]_(l,m) as the element at the l^(th) row and m^(th) column in the matrix T_(r), Equation 6 can be rewritten as follows:

$\begin{matrix}{{\left\lbrack T_{r} \right\rbrack_{l,m} = w_{N}^{{〚{({l\; m\frac{N}{r}})}〛}_{N}}},} & \left( {{Equation}\mspace{14mu} 7} \right)\end{matrix}$

where l=0, 1, . . . , r−1, m=0, 1, . . . , r−1, and 〚x〛_(N) represents the operation x modulo N. Further, by defining W_(N(m,v,s)), the set of the twiddle factor matrix can be determined as follows:

$\left\lbrack W_{N} \right\rbrack_{l,m(v,s)} = \mathrm{diag}\left( w_{N(0,v,s)}, w_{N(1,v,s)}, \ldots, w_{N(r-1,v,s)} \right), \qquad \text{(Equation 8)}$

where r is the FFT's radix, v=0, 1, . . . , V−1 indexes the words of size r (with V=N/r), and s=0, 1, . . . , S indexes the stages (or iterations), with S=log_(r)N−1.

Finally, Equation 8 could be expressed for the different stages in an FFT process as follows:

$\begin{matrix}{\left\lbrack W_{N} \right\rbrack_{l,{m{({v,s})}}} = \left\{ {\begin{matrix}w_{N}^{{〚{{\lfloor\frac{v}{r^{s}}\rfloor}l\; r^{s}}〛}_{N}} & {{{for}\mspace{14mu} l} = m} \\0 & {elsewhere}\end{matrix},} \right.} & \left( {{Equation}\mspace{14mu} 9} \right)\end{matrix}$

for the DIF process. For the DIT process, Equation 8 can be expressed as follows:

$\begin{matrix}{\left\lbrack W_{N} \right\rbrack_{l,{m{({v,s})}}} = \left\{ {\begin{matrix}w_{N}^{{〚{{\lfloor\frac{v}{r^{({S - s})}}\rfloor}l\; r^{({S - s})}}〛}_{N}} & {{{for}\mspace{14mu} l} = m} \\0 & {elsewhere}\end{matrix},} \right.} & (10)\end{matrix}$

where l=0, 1, . . . , r−1 is the l^(th) butterfly's output, m=0, 1, . . . , r−1 is the m^(th) butterfly's input, and └x┘ represents the integer part operator of x.

Consequently, the l^(th) transform output during each stage could be illustrated as follows:

$\begin{matrix}{{{X_{({v,s})}\lbrack l\rbrack} = {\sum\limits_{m = 0}^{r - 1}{{x_{({v,s})}\lbrack m\rbrack}w_{N}^{{〚{{l\; m\frac{N}{r}} + {{\lfloor{v/r^{s}}\rfloor}l\; r^{s}}}〛}_{N}}}}},} & \left( {{Equation}\mspace{14mu} 11} \right)\end{matrix}$

for the DIF process, and could be expressed as follows for the DIT process:

$\begin{matrix}{{X_{({v,s})}\lbrack l\rbrack} = {\sum\limits_{m = 0}^{r - 1}{{x_{({v,s})}\lbrack m\rbrack}{w_{N}^{{〚{{l\; m\frac{N}{r}} + {{\lfloor{v/r^{({S - s})}}\rfloor}m\; r^{({S - s})}}}〛}_{N}}.}}}} & \left( {{Equation}\mspace{14mu} 12} \right)\end{matrix}$

The read address generator (RAG), write address generator (WAG), and coefficient address generator (CAG) can be written for DIF and DIT processes, respectively. The m^(th) butterfly's input of the v^(th) word x_((m)) at the s^(th) stage (s^(th) iteration) can be determined as follows:

$\begin{matrix}{{RAG}_{({m,v,0})} = {{m \times \frac{N}{r}} + {v.}}} & \left( {{Equation}\mspace{14mu} 13} \right)\end{matrix}$

For s>0, the read address generator can determine the read address as follows:

$\begin{matrix}{{RAG}_{({m,v,s})} = {{m \times \frac{N}{r^{2}}} + {〚{\left\lfloor \frac{v}{r^{({s - 1})}} \right\rfloor \times \frac{N}{r}}〛}_{N} + {〚k〛}_{r^{({s - 1})}} + {\left\lfloor \frac{v}{r^{s}} \right\rfloor \times r^{({s - 1})}}}} & \left( {{Equation}\mspace{14mu} 14} \right)\end{matrix}$

for the DIF process, and for the DIT process, the read address generator can be determined as follows:

$\begin{matrix}{{{RAG}_{({m,v,s})} = {{m \times \left( \frac{N}{r^{({s + 1})}} \right)} + {〚v〛}_{r^{({S - s})}} + {\left\lfloor \frac{v}{r^{({S - s})}} \right\rfloor \times r^{({S + 1 - s})}}}},} & \left( {{Equation}\mspace{14mu} 15} \right)\end{matrix}$

where m=0, 1, . . . , r−1, v=0, 1, . . . , V−1, and s=0, 1, . . . , S, with S=log_(r) N−1, in which 〚x〛_(N) represents the operation x modulo N and └x┘ represents the integer part operator of x.

For both cases, the l^(th) processed butterfly's output X_((l,v,s)) for the v^(th) word at the s^(th) stage can be stored at the memory address location determined according to the following equation:

$WAG_{(l,v,s)} = l\left( \frac{N}{r} \right) + v. \qquad \text{(Equation 16)}$

In this example, the input data and the output data are in natural order during each stage of the FFT process according to an Ordered Input Ordered Output (OIOO) algorithm.
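The DIT relations above can be combined into a short software model. The following is a minimal sketch, assuming N is an exact power of the radix r; it reads inputs with the DIT read address of Equation 15, applies the combined exponent of Equation 12, and writes with Equation 16. The function name `radix_r_dit_fft` and the use of a second buffer per stage are assumptions made for illustration, not details of the disclosed hardware.

```python
import cmath

def radix_r_dit_fft(x, r):
    """Ordered-input, ordered-output radix-r DIT FFT (Equations 12, 15, 16)."""
    N = len(x)
    stages, n = 0, N
    while n > 1:
        if n % r:
            raise ValueError("length must be a power of the radix")
        n //= r
        stages += 1
    S = stages - 1                      # S = log_r(N) - 1
    V = N // r                          # number of words of size r
    w = [cmath.exp(-2j * cmath.pi * k / N) for k in range(N)]  # w_N^k table
    data = list(x)
    for s in range(S + 1):
        span = r ** (S - s)             # r^(S-s), equal to N / r^(s+1)
        out = [0j] * N
        for v in range(V):
            shift = v // span           # floor(v / r^(S-s))
            for l in range(r):
                acc = 0j
                for m in range(r):
                    # Equation 15: read address of the m-th input of word v
                    rag = m * span + (v % span) + shift * span * r
                    # Equation 12: adder-tree and twiddle exponents combined
                    e = (l * m * (N // r) + shift * m * span) % N
                    acc += data[rag] * w[e]
                out[l * (N // r) + v] = acc   # Equation 16: write address
        data = out
    return data
```

Under these assumptions both the input and the output of every stage are in natural order, so no bit-reversal or digit-reversal pass is needed; for example, `radix_r_dit_fft(samples, 8)` on a 64-point list runs two radix-8 stages.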

The coefficient multipliers (twiddle factors) can be determined during each stage. The coefficient address generator values can be fed to the m^(th) butterfly's input of the v^(th) word x_((m)) at the s^(th) stage (s^(th) iteration), and can be determined according to the following equation:

$\begin{matrix}{{{CAG}_{({m,v,s})} = {〚{l \times \left( {{m\; V} + {\left\lfloor \frac{v}{r^{s}} \right\rfloor r^{s}}} \right)}〛}_{N}},} & \left( {{Equation}\mspace{14mu} 17} \right)\end{matrix}$

for the DIF process, and according to the following equation for the DIT process:

$\begin{matrix}{{CAG}_{({m,v,s})} = {{〚{m \times \left( {{lV} + {\left\lfloor \frac{v}{r^{({S - s})}} \right\rfloor \times r^{({S - s})}}} \right)}〛}_{N}.}} & \left( {{Equation}\mspace{14mu} 18} \right)\end{matrix}$

By examining Equations 16 and 17, it can be observed that the data are grouped with their corresponding coefficient multipliers during each stage due to the fact that the m^(th) coefficient multiplier of the l^(th) butterfly's output shifts if, and only if, v (v=0, 1, . . . , V−1) is equal to r^((S−s)) in the DIF process or v=r^(s) in the DIT process. As a result, and since V=N/r=r^(S), the total number of shifts during each stage in the DIT process would be r^(s), and the total number of shifts during each stage in the DIF process is r^((S−s)). Therefore, by implementing a word counter r^((S−s)) (wordcounter=0, 1, . . . , r^((S−s))−1) and a shifting counter r^(s) (shiftcounter=0, 1, . . . , r^(s)−1) in the DIT process (or a word counter r^(s) and a shifting counter r^((S−s)) in the DIF process), it is possible to obtain high efficiency DIT/DIF radix-r algorithms in which the access to the coefficient multiplier's memory is reduced compared to conventional radix-r DIT/DIF algorithms.

In addition, the occurrence of the multiplication by one (i.e., the elements of the twiddle factor matrix illustrated in Equation 8 are all equal to one) can be easily predicted when the shifting counter in both cases is equal to zero (i.e., v<r^(s) or v<r^((S−s))). By predicting when the shifting counter is equal to zero, the trivial multiplication by one (w^(0)) during the entire FFT process can be avoided.
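The prediction can be illustrated for the DIT case using the exponent of Equation 10, in which the shifting counter is └v/r^((S−s))┘. The sketch below is an assumption-laden illustration (the helper names are hypothetical) that lists the words whose twiddle factors are all w^(0) and can therefore skip the coefficient multiplication.

```python
def last_stage_index(N, r):
    """Return S = log_r(N) - 1, assuming N is an exact power of r."""
    stages, n = 0, N
    while n > 1:
        if n % r:
            raise ValueError("length must be a power of the radix")
        n //= r
        stages += 1
    return stages - 1

def trivial_words(N, r, s):
    """Words v of DIT stage s whose shifting counter floor(v / r^(S-s)) is zero."""
    S = last_stage_index(N, r)
    span = r ** (S - s)
    words = []
    for v in range(N // r):
        shift = v // span
        # Equation 10 exponents for every butterfly input l of this word
        exponents = [(shift * l * span) % N for l in range(r)]
        if shift == 0:
            assert all(e == 0 for e in exponents)  # every twiddle is w^0 = 1
            words.append(v)
    return words
```

In this model the first r^((S−s)) words of each DIT stage need no coefficient fetch; in the first stage (s=0) that is every word.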

With the same reasoning as above, the complexity of the DIT/DIF reading generators can be obtained and replaced with simple counters. Further reductions in computation and further reductions in the coefficient multiplier's memory access can also be realized. For simplicity and in order to reduce the complexity of the equations that will follow, the terms can be defined as follows:

$\begin{matrix}{\begin{aligned}\alpha &= r^{(S - s)} = 2^{(S - s)}, & \alpha_{\lambda} &= \begin{cases}\alpha & \text{for } \lambda = 0 \\ \lambda\alpha & \text{for } \lambda \geq 1\end{cases} \\ \chi &= \alpha, & \chi_{\lambda} &= \begin{cases}0 & \text{for } \lambda = 0 \\ \lambda\alpha & \text{for } \lambda \geq 1\end{cases} \\ \beta &= r \times r^{(S - s)} = 2 \times 2^{(S - s)}, & \beta_{\lambda} &= \lambda\beta.\end{aligned}} & \left( {{Equation}\mspace{14mu} 19} \right)\end{matrix}$

For the radix-2 case, Equation 12 at the s^(th) stage can be rewritten as follows:

$\begin{matrix}{{\begin{bmatrix}X_{({k + \chi_{\lambda}})} \\X_{({k + \chi_{\lambda} + V})}\end{bmatrix} = \begin{bmatrix}{x_{({n + \beta_{\lambda}})} + {x_{({n + \beta_{\lambda} + \alpha_{\lambda}})}w_{N}^{{〚{{\lfloor{v/2^{({S - s})}}\rfloor}2^{({S - s})}}〛}_{N}}}} \\{x_{({n + \beta_{\lambda}})} + {x_{({n + \beta_{\lambda} + \alpha_{\lambda}})}w_{N}^{{〚{\frac{N}{2} + {{\lfloor{v/2^{({S - s})}}\rfloor}2^{({S - s})}}}〛}_{N}}}}\end{bmatrix}},} & \left( {{Equation}\mspace{14mu} 20} \right)\end{matrix}$

that could be simplified as follows:

$\begin{matrix}{{\begin{bmatrix}X_{({k + \chi_{\lambda}})} \\X_{({k + \chi_{\lambda} + V})}\end{bmatrix} = \begin{bmatrix}{x_{({n + \beta_{\lambda}})} + {x_{({n + \beta_{\lambda} + \alpha_{\lambda}})}w_{N}^{{〚{{\lfloor{v/2^{({S - s})}}\rfloor}2^{({S - s})}}〛}_{N}}}} \\{x_{({n + \beta_{\lambda}})} - {x_{({n + \beta_{\lambda} + \alpha_{\lambda}})}w_{N}^{{〚{{\lfloor{v/2^{({S - s})}}\rfloor}2^{({S - s})}}〛}_{N}}}}\end{bmatrix}},} & \left( {{Equation}\mspace{14mu} 21} \right)\end{matrix}$

where x denotes the input from the previous stage and X represents the transform output.

By replacing the term └v/2^((S−s))┘ with the term λ, which is the value of the shifting counter that cannot exceed 2^(s)−1, Equation 21 may be written to have the final form as follows:

$\begin{matrix}{\begin{bmatrix}X_{({k + \chi_{\lambda}})} \\X_{({k + \chi_{\lambda} + V})}\end{bmatrix} = {\begin{bmatrix}{x_{({n + \beta_{\lambda}})} + {x_{({n + \beta_{\lambda} + \alpha_{\lambda}})}w_{N}^{\alpha_{\lambda}}}} \\{x_{({n + \beta_{\lambda}})} - {x_{({n + \beta_{\lambda} + \alpha_{\lambda}})}w_{N}^{\alpha_{\lambda}}}}\end{bmatrix}.}} & \left( {{Equation}\mspace{14mu} 22} \right)\end{matrix}$

For the first iteration (s=0), the maximum value that v can attain is V−1. As a result, the term └v/V┘=λ is always zero; therefore, for the first iteration, Equation 22 can be written as follows:

$\begin{matrix}{{\begin{bmatrix}X_{(k)} \\X_{({k + V})}\end{bmatrix} = \begin{bmatrix}{x_{(n)} + x_{({n + \alpha})}} \\{x_{(n)} - x_{({n + \alpha})}}\end{bmatrix}},} & \left( {{Equation}\mspace{14mu} 23} \right)\end{matrix}$

During the second iteration (s=1), the term λ is either zero or one; as a result, Equation 22 can be expressed as follows:

$\begin{matrix}{{\begin{bmatrix}X_{(k)} \\X_{({k + V})} \\X_{({k + \alpha})} \\X_{({k + \alpha + V})}\end{bmatrix} = \begin{bmatrix}{x_{(n)} + x_{({n + \alpha})}} \\{x_{(n)} - x_{({n + \alpha})}} \\{x_{({n + \beta})} + {x_{({n + \beta + \alpha})}w_{N}^{\alpha}}} \\{x_{({n + \beta})} - {x_{({n + \beta + \alpha})}w_{N}^{\alpha}}}\end{bmatrix}},} & \left( {{Equation}\mspace{14mu} 24} \right)\end{matrix}$

which could be simplified as follows:

$\begin{matrix}{{\begin{bmatrix}X_{(k)} \\X_{({k + V})} \\X_{({k + \alpha})} \\X_{({k + \alpha + V})}\end{bmatrix} = \begin{bmatrix}{x_{(n)} + x_{({n + \alpha})}} \\{x_{(n)} - x_{({n + \alpha})}} \\{x_{({n + \beta})} + {\left( {- j} \right)x_{({n + \beta + \alpha})}}} \\{x_{({n + \beta})} - {\left( {- j} \right)x_{({n + \beta + \alpha})}}}\end{bmatrix}},} & \left( {{Equation}\mspace{14mu} 25} \right)\end{matrix}$

Finally, for the third iteration (s=2), the term λ could have the values 0, 1, 2, and 3, and, as a result, Equation 22 can be illustrated as follows:

$\begin{matrix}{{\begin{bmatrix}X_{(k)} \\X_{({k + V})} \\X_{({k + \alpha})} \\X_{({k + \alpha + V})} \\X_{({k + {2\alpha}})} \\X_{({k + {2\alpha} + V})} \\X_{({k + {3\alpha}})} \\X_{({k + {3\alpha} + V})}\end{bmatrix} = \begin{bmatrix}{x_{(n)} + x_{({n + \alpha})}} \\{x_{(n)} - x_{({n + \alpha})}} \\{x_{({n + \beta})} + {x_{({n + \beta + \alpha})}w_{N}^{\alpha}}} \\{x_{({n + \beta})} - {x_{({n + \beta + \alpha})}w_{N}^{\alpha}}} \\{x_{({n + {2\beta}})} + {x_{({n + {2\beta} + {2\alpha}})}w_{N}^{2\alpha}}} \\{x_{({n + {2\beta}})} - {x_{({n + {2\beta} + {2\alpha}})}w_{N}^{2\alpha}}} \\{x_{({n + {3\beta}})} + {x_{({n + {3\beta} + {3\alpha}})}w_{N}^{3\alpha}}} \\{x_{({n + {3\beta}})} - {x_{({n + {3\beta} + {3\alpha}})}w_{N}^{3\alpha}}}\end{bmatrix}},} & \left( {{Equation}\mspace{14mu} 26} \right)\end{matrix}$

The matrices of Equation 26 may be simplified as follows:

$\begin{matrix}{{\begin{bmatrix}X_{(k)} \\X_{({k + V})} \\X_{({k + \alpha})} \\X_{({k + \alpha + V})} \\X_{({k + {2\alpha}})} \\X_{({k + {2\alpha} + V})} \\X_{({k + {3\alpha}})} \\X_{({k + {3\alpha} + V})}\end{bmatrix} = \begin{bmatrix}{x_{(n)} + x_{({n + \alpha})}} \\{x_{(n)} - x_{({n + \alpha})}} \\{x_{({n + \beta})} + {\left( {\frac{\sqrt{2}}{2}\left( {1 - j} \right)} \right) \times x_{({n + \beta + \alpha})}}} \\{x_{({n + \beta})} - {\left( {\frac{\sqrt{2}}{2}\left( {1 - j} \right)} \right) \times x_{({n + \beta + \alpha})}}} \\{x_{({n + {2\beta}})} + {\left( {- j} \right) \times x_{({n + {2\beta} + {2\alpha}})}}} \\{x_{({n + {2\beta}})} - {\left( {- j} \right) \times x_{({n + {2\beta} + {2\alpha}})}}} \\{x_{({n + {3\beta}})} + {\left( {\frac{- \sqrt{2}}{2}\left( {1 + j} \right)} \right) \times x_{({n + {3\beta} + {3\alpha}})}}} \\{x_{({n + {3\beta}})} - {\left( {\frac{- \sqrt{2}}{2}\left( {1 + j} \right)} \right) \times x_{({n + {3\beta} + {3\alpha}})}}}\end{bmatrix}},} & \left( {{Equation}\mspace{14mu} 27} \right)\end{matrix}$

and the signal flow graph of an 8-point DIT FFT according to Equation 27 is illustrated in FIG. 9.

FIG. 9 depicts a signal flow graph 900 of an 8-point DIT FFT, in accordance with certain embodiments of the present disclosure. The graph 900 may include a plurality of summing nodes, generally indicated at 902. Further, the graph 900 can include reordering operations, generally indicated at 904. The graph 900 depicts a plurality of summing nodes, generally indicated at 906, and two multiplier nodes 907A and 907B. Further, the graph 900 may include a plurality of reordering operations, generally indicated at 908. Additionally, the graph 900 can include multipliers 909A, 909B, and 909C and a plurality of summing nodes, generally indicated at 910.

The multiplication by −j at 907A and 907B in FIG. 9 can be easily incorporated in the additions by switching the real and imaginary parts of the data, and the multiplication of the input data by

${\pm \frac{\sqrt{2}}{2}} \pm {j\frac{\sqrt{2}}{2}}$

may cost 2 real multiplications. As a result, the total cost of real multiplication of the proposed structure can include 4 real multiplication operations, as compared to the structure of FIG. 4 that would cost 20 real multiplication operations (i.e., 5 complex multiplications).
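The reduced-cost multiplications can be sketched as follows. This is an illustrative software model with hypothetical function names: multiplying by −j swaps the real and imaginary parts with one sign change, and multiplying by (√2/2)(1−j) or by −(√2/2)(1+j) uses one real addition, one real subtraction, and two real multiplications.

```python
import math

HALF_SQRT2 = math.sqrt(2.0) / 2.0

def mul_minus_j(z):
    """(a + jb) * (-j) = b - ja: a swap and a negation, no multiplication."""
    return complex(z.imag, -z.real)

def mul_tau(z):
    """(a + jb) * (sqrt(2)/2)(1 - j) = (sqrt(2)/2) * ((a + b) + j(b - a))."""
    a, b = z.real, z.imag
    return complex(HALF_SQRT2 * (a + b), HALF_SQRT2 * (b - a))

def mul_sigma(z):
    """(a + jb) * (-(sqrt(2)/2))(1 + j) = -(sqrt(2)/2) * ((a - b) + j(a + b))."""
    a, b = z.real, z.imag
    return complex(-HALF_SQRT2 * (a - b), -HALF_SQRT2 * (a + b))
```

Each helper can be checked against an ordinary complex product, which would otherwise take four real multiplications and two real additions.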

FIG. 10 depicts a graph 800 of the 8^(th) root of unity, in accordance with certain embodiments of the present disclosure. The graph 800 depicts complex numbers including imaginary (I) and real (R) components. In some embodiments, the complex numbers may result in a value of one when raised to some positive integer power n.

From Equations 23, 25, and 27, the first, second, and third iterations of the DIT FFT process may include only trivial multiplication operations. In order to predict the occurrence of the trivial multiplication in the rest of the iterations (i.e., s≥3), which is a multiple of w_(8) as shown in FIG. 10, the following discussion introduces the term 2^((s−2)) (hereinafter referred to as a “separator”) that will subdivide 2^(s) into 4 sub-regions. The choice of the separator's value will be based on the following equations. For Lemma 1, for all stages of the OIOO FFT algorithm, the product of 2^((s−2)) and 2^((S−s)) is always equal to N/8 for all s. This identity can be proven according to the following equation:

$\begin{matrix}{{2^{({S - s})} \times 2^{({s - 2})}} = {2^{({S - 2})} = {\frac{N}{2^{3}} = {\frac{N}{8}.}}}} & \left( {{Equation}\mspace{14mu} 28} \right)\end{matrix}$
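Lemma 1 can be checked numerically for the radix-2 case, where N=2^((S+1)). The sketch below, with the hypothetical helper `trivial_lambdas`, lists the shifting-counter values λ of stage s whose twiddle exponent λ×2^((S−s)) is a multiple of N/8, that is, whose twiddle factor is a power of w_(8).

```python
def trivial_lambdas(N, s):
    """Shifting-counter values in radix-2 DIT stage s with twiddles in powers of w_8."""
    S = N.bit_length() - 2          # N = 2^(S+1), so S = log2(N) - 1
    if 2 ** (S + 1) != N or s < 2 or s > S:
        raise ValueError("N must be a power of 2 and 2 <= s <= S")
    alpha = 2 ** (S - s)
    separator = 2 ** (s - 2)
    assert alpha * separator == N // 8       # Lemma 1 (Equation 28)
    # The twiddle exponent for counter value lam is lam * alpha (Equation 22)
    return [lam for lam in range(2 ** s) if (lam * alpha) % (N // 8) == 0]
```

The values returned are exactly 0, 2^((s−2)), 2×2^((s−2)), and 3×2^((s−2)), corresponding to exponents 0, N/8, N/4, and 3N/8.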

For different values of λ, Equation 22 provides the following values:

  i.  λ = 0  ii.  λ₀ ∈ 1  ⋯  2^((s − 2))[?  iii.  λ = 2^((s − 2))  iv.  λ₁ ∈ 2^((s − 2)) + 1  ⋯  2 × 2^((s − 2))[??indicates text missing or illegible when filed

For the i^(th) case at the s^(th) iteration (stage), Equation 22 can be expressed as follows:

$\begin{matrix}{{\begin{bmatrix}{X(k)} \\{X\left( {k + V} \right)}\end{bmatrix} = \begin{bmatrix}{{x(n)} + {x\left( {n + \alpha} \right)}} \\{{x(n)} - {x\left( {n + \alpha} \right)}}\end{bmatrix}},} & \left( {{Equation}\mspace{14mu} 29} \right)\end{matrix}$

For the iii^(th) case, Equation 22 can be expressed as follows:

$\begin{matrix}{\begin{bmatrix}{X\left( {k + 2^{({S - 2})}} \right)} \\{X\left( {k + 2^{({S - 2})} + V} \right)}\end{bmatrix} = {\quad{\begin{bmatrix}{{x\left( {n + 2^{({S - 1})}} \right)} + {{x\left( {n + 2^{({S - 1})} + 2^{({S - 2})}} \right)}\tau}} \\{{x\left( {n + 2^{({S - 1})}} \right)} - {{x\left( {n + 2^{({S - 1})} + 2^{({S - 2})}} \right)}\tau}}\end{bmatrix},\mspace{20mu} {{{where}\mspace{14mu} \tau} = {\frac{\sqrt{2}}{2}{\left( {1 - j} \right).}}}}}} & \left( {{Equation}\mspace{14mu} 30} \right)\end{matrix}$

For the v^(th) and vii^(th) cases, Equation 22 can be expressed, respectively, as follows:

$\begin{matrix}{\mspace{79mu} {\begin{bmatrix}X_{({k + 2^{({S - 1})}})} \\X_{({k + 2^{({S - 1})} + V})}\end{bmatrix} = {\begin{bmatrix}{x_{({n + 2^{S}})} + {x_{({n + 2^{S} + 2^{({S - 1})}})}\left( {- j} \right)}} \\{x_{({n + 2^{S}})} - {x_{({n + 2^{S} + 2^{({S - 1})}})}\left( {- j} \right)}}\end{bmatrix}.}}} & \left( {{Equation}\mspace{14mu} 31} \right) \\{\begin{bmatrix}X_{({k + {3 \times 2^{({S - 2})}}})} \\X_{({k + {3 \times 2^{({S - 2})}} + V})}\end{bmatrix} = {\quad{\begin{bmatrix}{x_{({n + {3 \times 2^{({S - 1})}}})} + {x_{({n + {3 \times 2^{({S - 1})}} + {3 \times 2^{({S - 2})}}})}\sigma}} \\{x_{({n + {3 \times 2^{({S - 1})}}})} - {x_{({n + {3 \times 2^{({S - 1})}} + {3 \times 2^{({S - 2})}}})}\sigma}}\end{bmatrix},\mspace{20mu} {{{where}\mspace{14mu} \sigma} = {\frac{- \sqrt{2}}{2}{\left( {1 + j} \right).}}}}}} & \left( {{Equation}\mspace{14mu} 32} \right)\end{matrix}$

Therefore, for s≥3, there are four sets of size r^((S−s)) words that have

${\frac{\pm \sqrt{2}}{2}\left( {1 \pm j} \right)},$

1, and −j as trivial multiplications that can be grouped. Grouping the “trivial” multiplications can yield the following expression:

$\begin{matrix}{\begin{bmatrix}X_{(k)} \\X_{({k + V})} \\X_{({k + 2^{({S - 2})}})} \\X_{({k + 2^{({S - 2})} + V})} \\X_{({k + {2 \times 2^{({S - 2})}}})} \\X_{({k + {2 \times 2^{({S - 2})}} + V})} \\X_{({k + {3 \times 2^{({S - 2})}}})} \\X_{({k + {3 \times 2^{({S - 2})}} + V})}\end{bmatrix} = {\quad{\begin{bmatrix}{x_{(n)} + x_{({n + \alpha})}} \\{x_{(n)} - x_{({n + \alpha})}} \\{x_{({n + 2^{({S - 1})}})} + {x_{({n + 2^{({S - 1})} + 2^{({S - 2})}})}\tau}} \\{x_{({n + 2^{({S - 1})}})} - {x_{({n + 2^{({S - 1})} + 2^{({S - 2})}})}\tau}} \\{x_{({n + 2^{S}})} + {x_{({n + 2^{S} + {2 \times 2^{({S - 2})}}})}\left( {- j} \right)}} \\{x_{({n + 2^{S}})} - {x_{({n + 2^{S} + {2 \times 2^{({S - 2})}}})}\left( {- j} \right)}} \\{x_{({n + {3 \times 2^{({S - 1})}}})} + {x_{({n + {3 \times 2^{({S - 1})}} + {3 \times 2^{({S - 2})}}})}\sigma}} \\{x_{({n + {3 \times 2^{({S - 1})}}})} - {x_{({n + {3 \times 2^{({S - 1})}} + {3 \times 2^{({S - 2})}}})}\sigma}}\end{bmatrix},}}} & \left( {{Equation}\mspace{14mu} 33} \right)\end{matrix}$

and the resulting structure for this particular case is depicted in FIG. 11.

FIG. 11 depicts a graph 1100 of a Radix-2³ FFT butterfly structure for a trivial computation, in accordance with certain embodiments of the present disclosure. The graph 1100 may include summing nodes, generally indicated at 1103. The graph 1100 may include a complex multiplier node 1103 and can include summing nodes, generally indicated at 1104. The graph 1100 may further include a trivial multiplier 1105 and can include summing nodes, generally indicated at 1106. The graph 1100 can further include a complex multiplier 1107 and can include summing nodes, generally indicated at 1108.
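The multiplications by −j, τ, and σ that appear in Equations 30 to 33 can each be carried out without a general complex multiplier. The following Python sketch illustrates this under the assumption that a complex word is held as a (real, imaginary) pair; the function names and the constant HALF_SQRT2 are hypothetical and are not taken from the disclosure.

```python
import math

# Illustrative constant: the only real coefficient needed for tau and sigma.
HALF_SQRT2 = math.sqrt(2.0) / 2.0


def mul_minus_j(re, im):
    """(re + j*im) * (-j): a swap and a sign change, no multiplication."""
    return im, -re


def mul_tau(re, im):
    """(re + j*im) * tau, with tau = (sqrt(2)/2)(1 - j).

    Uses two real additions and two real multiplications.
    """
    return HALF_SQRT2 * (re + im), HALF_SQRT2 * (im - re)


def mul_sigma(re, im):
    """(re + j*im) * sigma, with sigma = (-sqrt(2)/2)(1 + j).

    Uses two real additions and two real multiplications.
    """
    return HALF_SQRT2 * (im - re), -HALF_SQRT2 * (re + im)
```

A general complex multiplication written out in real arithmetic needs four real multiplications and two real additions, so each of the helpers above saves at least two real multiplications.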

For the other cases, and by comparing the domains of λ, each domain of λ can be represented as follows:

$\begin{matrix}{\begin{matrix}{\lambda \in {{\xi \; r^{({s - 2})}} + 1}} & {{\xi \; r^{({s - 2})}} + 2} & \cdots & {{\xi \; r^{({s - 2})}} + r^{({s - 2})}}\end{matrix}\lbrack} & \left( {{Equation}\mspace{14mu} 34} \right)\end{matrix}$

where ξ=0, 1, 2 and 3. Other cases can be expressed as follows:

$\begin{matrix}{\begin{bmatrix}X_{({k + \alpha_{\lambda_{\xi}}})} \\X_{({k + \alpha_{\lambda_{\xi}} + V})}\end{bmatrix} = {\begin{bmatrix}{x_{({n + \beta_{\lambda_{\xi}}})} + {x_{({n + \beta_{\lambda_{\xi}} + \alpha_{\lambda_{\xi}}})}w_{N}^{\alpha_{\lambda_{\xi}}}}} \\{x_{({n + \beta_{\lambda_{\xi}}})} - {x_{({n + \beta_{\lambda_{\xi}} + \alpha_{\lambda_{\xi}}})}w_{N}^{\alpha_{\lambda_{\xi}}}}}\end{bmatrix}.}} & \left( {{Equation}\mspace{14mu} 35} \right)\end{matrix}$

By regrouping these four cases, each of which shares the same coefficient multiplier, the following expression may be realized:

$\begin{matrix}{\begin{bmatrix}X_{({k + \alpha_{\lambda}})} \\X_{({k + \alpha_{\lambda} + V})} \\X_{({k + {r^{({s - 2})}\alpha_{\lambda}}})} \\X_{({k + {r^{({s - 2})}\alpha_{\lambda}} + V})} \\X_{({k + {2r^{({s - 2})}\alpha_{\lambda}}})} \\X_{({k + {2r^{({s - 2})}\alpha_{\lambda}} + V})} \\X_{({k + {3r^{({s - 2})}\alpha_{\lambda}}})} \\X_{({k + {3r^{({s - 2})}\alpha_{\lambda}} + V})}\end{bmatrix} = {\quad{\begin{bmatrix}{x_{({n + \beta_{\lambda}})} + {x_{({n + \beta_{\lambda} + \alpha_{\lambda}})}w_{N}^{\alpha_{\lambda}}}} \\{x_{({n + \beta_{\lambda}})} - {x_{({n + \beta_{\lambda} + \alpha_{\lambda}})}w_{N}^{\alpha_{\lambda}}}} \\{x_{({n + {r^{({s - 2})}\beta_{\lambda}}})} + {x_{({n + {r^{({s - 2})}\beta_{\lambda}} + {r^{({s - 2})}\alpha_{\lambda}}})}w_{N}^{{({r^{({s - 2})} + \lambda})}\alpha}}} \\{x_{({n + {r^{({s - 2})}\beta_{\lambda}}})} - {x_{({n + {r^{({s - 2})}\beta_{\lambda}} + {r^{({s - 2})}\alpha_{\lambda}}})}w_{N}^{{({r^{({s - 2})} + \lambda})}\alpha}}} \\{x_{({n + {2r^{({s - 2})}\beta_{\lambda}}})} + {x_{({n + {2r^{({s - 2})}\beta_{\lambda}} + {2r^{({s - 2})}\alpha_{\lambda}}})}w_{N}^{{({{2r^{({s - 2})}} + \lambda})}\alpha}}} \\{x_{({n + {2r^{({s - 2})}\beta_{\lambda}}})} - {x_{({n + {2r^{({s - 2})}\beta_{\lambda}} + {2r^{({s - 2})}\alpha_{\lambda}}})}w_{N}^{{({{2r^{({s - 2})}} + \lambda})}\alpha}}} \\{x_{({n + {3r^{({s - 2})}\beta_{\lambda}}})} + {x_{({n + {3r^{({s - 2})}\beta_{\lambda}} + {3r^{({s - 2})}\alpha_{\lambda}}})}w_{N}^{{({{3r^{({s - 2})}} + \lambda})}\alpha}}} \\{x_{({n + {3r^{({s - 2})}\beta_{\lambda}}})} - {x_{({n + {3r^{({s - 2})}\beta_{\lambda}} + {3r^{({s - 2})}\alpha_{\lambda}}})}w_{N}^{{({{3r^{({s - 2})}} + \lambda})}\alpha}}}\end{bmatrix},}}} & \left( {{Equation}\mspace{14mu} 36} \right)\end{matrix}$

where λ ∈ 1 ⋯ 2^((s−2))[. The entity w_N^((2r^((s−2)) + λ)α) in the fifth and the sixth terms of Equation 36 can be simplified as follows:

$\begin{matrix}{w_{N}^{{({{2r^{({s - 2})}} + \lambda})}\alpha} = {{w_{N}^{\lambda\alpha}w_{N}^{2r^{({s - 2})}\alpha}} = {{w_{N}^{\alpha_{\lambda}}w_{N}^{\frac{N}{4}}} = {- {jw}_{N}^{\alpha_{\lambda}}}}}} & \left( {{Equation}\mspace{14mu} 37} \right)\end{matrix}$

In this example, the domain for λ for the entities w_N^((r^((s−2)) + λ)α) and w_N^((3r^((s−2)) + λ)α) can be defined as follows:

$\begin{matrix}{\lambda \in {2^{({s - 2})}\mspace{14mu} \cdots \mspace{14mu} 1\lbrack}} & \left( {{Equation}\mspace{14mu} 38} \right)\end{matrix}$

These entities could be expressed, respectively, as follows:

$\begin{matrix}{{w_{N}^{{({r^{({s - 2})} + {({r^{({s - 2})} - \lambda})}})}\alpha} = {w_{N}^{{2r^{({s - 2})}\alpha} - {\lambda\alpha}} = {w_{N}^{- {({\alpha_{\lambda} - \frac{N}{4}})}} = {{conj}\left( {jw}_{N}^{\alpha_{\lambda}} \right)}}}},} & \left( {{Equation}\mspace{14mu} 39} \right) \\{{w_{N}^{{({{3r^{({s - 2})}} + {({r^{({s - 2})} - \lambda})}})}\alpha} = {w_{N}^{{4r^{({s - 2})}\alpha} - {\lambda\alpha}} = {{w_{N}^{- \alpha_{\lambda}}w_{N}^{\frac{N}{2}}} = {- {{conj}\left( w_{N}^{\alpha_{\lambda}} \right)}}}}},} & \left( {{Equation}\mspace{14mu} 40} \right)\end{matrix}$

where conj in Equations 39 and 40 denotes the complex conjugate operation. As a result, Equation 36 can be rewritten as follows:

$\begin{matrix}{\begin{bmatrix}X_{({k + \alpha_{\lambda}})} \\X_{({k + \alpha_{\lambda} + V})} \\X_{({k + {r^{({s - 2})}\alpha_{\lambda}}})} \\X_{({k + {r^{({s - 2})}\alpha_{\lambda}} + V})} \\X_{({k + {2r^{({s - 2})}\alpha_{\lambda}}})} \\X_{({k + {2r^{({s - 2})}\alpha_{\lambda}} + V})} \\X_{({k + {3r^{({s - 2})}\alpha_{\lambda}}})} \\X_{({k + {3r^{({s - 2})}\alpha_{\lambda}} + V})}\end{bmatrix} = {\quad\begin{bmatrix}{x_{({n + \beta_{\lambda}})} + {x_{({n + \beta_{\lambda} + \alpha_{\lambda}})}w_{N}^{\alpha_{\lambda}}}} \\{x_{({n + \beta_{\lambda}})} - {x_{({n + \beta_{\lambda} + \alpha_{\lambda}})}w_{N}^{\alpha_{\lambda}}}} \\{x_{({n + {r^{({s - 2})}\beta_{\lambda}}})} + {x_{({n + {2r^{({s - 2})}\beta_{\lambda}} - {{({{2r^{({s - 2})}} - 1})}\alpha_{\lambda}}})} \times {{conj}\left( {jw}_{N}^{\alpha_{\lambda}} \right)}}} \\{x_{({n + {r^{({s - 2})}\beta_{\lambda}}})} - {x_{({n + {2r^{({s - 2})}\beta_{\lambda}} - {{({{2r^{({s - 2})}} - 1})}\alpha_{\lambda}}})} \times {{conj}\left( {jw}_{N}^{\alpha_{\lambda}} \right)}}} \\{x_{({n + {2r^{({s - 2})}\beta_{\lambda}}})} + {{x_{({n + {2r^{({s - 2})}\beta_{\lambda}} + {2r^{({s - 2})}\alpha_{\lambda}}})}\left( {- j} \right)}w_{N}^{\alpha_{\lambda}}}} \\{x_{({n + {2r^{({s - 2})}\beta_{\lambda}}})} - {{x_{({n + {2r^{({s - 2})}\beta_{\lambda}} + {2r^{({s - 2})}\alpha_{\lambda}}})}\left( {- j} \right)}w_{N}^{\alpha_{\lambda}}}} \\{x_{({n + {3r^{({s - 2})}\beta_{\lambda}}})} + {x_{({n + {3r^{({s - 2})}\beta_{\lambda}} - {{({{3r^{({s - 2})}} - 1})}\alpha_{\lambda}}})} \times {{conj}\left( w_{N}^{\alpha_{\lambda}} \right)}}} \\{x_{({n + {3r^{({s - 2})}\beta_{\lambda}}})} - {x_{({n + {3r^{({s - 2})}\beta_{\lambda}} - {{({{3r^{({s - 2})}} - 1})}\alpha_{\lambda}}})} \times {{conj}\left( w_{N}^{\alpha_{\lambda}} \right)}}}\end{bmatrix}}} & \left( {{Equation}\mspace{14mu} 41} \right)\end{matrix}$
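The simplifications above rest on three exponent identities of the twiddle factor w_N = e^(−j2π/N): adding N/4 to the exponent multiplies the value by −j, an exponent of N/4 − a gives conj(j·w_N^a), and an exponent of N/2 − a gives −conj(w_N^a). The Python sketch below checks these identities numerically and shows how the related coefficients could be derived from a single stored value. The helper names and the sample values of N and a are illustrative assumptions, not part of the disclosure.

```python
import cmath


def twiddle(N, k):
    """Return w_N^k = exp(-j*2*pi*k/N)."""
    return cmath.exp(-2j * cmath.pi * k / N)


def derived_twiddles(w):
    """Derive three related twiddle factors from one stored value w = w_N^a.

    Only sign changes, real/imaginary swaps, and conjugation are needed,
    so no further coefficient fetch is required in this sketch.
    """
    return {
        "a + N/4": -1j * w,               # w_N^(a + N/4)
        "N/4 - a": (1j * w).conjugate(),  # w_N^(N/4 - a)
        "N/2 - a": -w.conjugate(),        # w_N^(N/2 - a)
    }


# Hypothetical sample values, chosen only for the numerical check.
N, a = 64, 5
derived = derived_twiddles(twiddle(N, a))
```

Comparing each entry of `derived` against `twiddle(N, ...)` evaluated at the corresponding exponent confirms the identities to within floating-point rounding.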

From Equation 41, the FFT radix-2³ butterfly can be derived as depicted and described below with respect to FIG. 12.

FIG. 12 depicts a graph of a Radix-2³ FFT butterfly structure 1200 for a non-trivial computation, in accordance with certain embodiments of the present disclosure. In this example, one complex coefficient multiplier (or twiddle factor) can be used for each group of eight complex inputs. In addition, the coefficient multiplier memory can be accessed once for each 4×2^(s) word (a set of two inputs) for the DIT process. For the DIF process, the coefficient multiplier memory can be accessed once for every 2^((S−s)) word, where s is the actual stage (iteration) of the FFT process, S represents the total number of stages of the FFT process, and S = log₂(N) − 1.

In FIG. 12, the structure 1200 may include a complex multiplier node 1201 and can include summing nodes, generally indicated at 1202. The structure 1200 may also include a complex multiplier node 1203 and summing nodes, generally indicated at 1204. Further, the structure 1200 can include a complex multiplier node 1205 and summing nodes, generally indicated at 1206. The structure 1200 can also include a complex multiplier node 1207 and summing nodes, generally indicated at 1208.

Compared to conventional methods that require two memory accesses per four inputs and one memory access per two inputs, the FFT radix-2³ butterfly structure 1200 may use one memory access per eight inputs. Further, the multiplication by

$\pm \frac{\sqrt{2}}{2} \pm j\frac{\sqrt{2}}{2}$

can be predicted, where the number of arithmetical operations to complete the complex multiplication can be reduced from six to two as shown in Tables 1 and 2 below. Further, the reduction in memory accesses to the coefficient multiplier's memory is illustrated in Table 3 for different FFT sizes.

In Tables 1-3, conventional method #1 (“DIT”) refers to the method described in Y. Wang et al., “Novel Memory Reference Reduction Methods for FFT Implementations on DSP Processors”, IEEE Transactions on Signal Processing, Vol. 55, No. 5, May 2007. Further, conventional method #2 (“TMS”) refers to the DIF radix-2 FFT code taken from “TMS320C64x DSP Library Programmer's Reference”, Literature Number: SPRU565B, October 2003 (code DSP-radix-2, pp. 4-9, 4-10).

TABLE 1 Comparison in terms of real multiplications between the conventional methods and the Radix-2³ FFT method

                                          Multiplication reduction (%)
N       TMS      DIT      Radix-2³ FFT    TMS       DIT
8       48       8        4               91.7      50
16      128      40       24              81.25     40
32      320      136      88              72.5      35.29
64      768      392      264             65.62     32.65
128     1792     1032     712             60.26     31.1
256     4096     2568     1800            56.05     29.90
512     9216     6152     4360            52.69     29.12
1024    20480    14344    10248           49.96     28.55
2048    45056    32776    23560           47.70     28.11

TABLE 2 Comparison in terms of real additions between the conventional methods and the Radix-2³ FFT method

                                          Addition reduction (%)
N       TMS      DIT      Radix-2³ FFT    TMS       DIT
8       72       52       48              33.34     7.6
16      192      148      140             27.08     5.40
32      480      392      380             20.83     3.06
64      1152     988      972             15.62     1.6
128     2688     2400     2380            11.45     0.83
256     6144     5668     5644            8.13      0.77
512     13824    13096    13068           5.46      0.21
1024    30720    28740    28708           6.54      0.11
2048    67584    66612    66572           1.49      0.06

TABLE 3 Comparison in terms of memory accesses to the coefficient multiplier in the conventional methods versus the Radix-2³ FFT method, where each complex access is counted as 1

                                          Memory access reduction (%)
N       TMS      DIT      Radix-2³ FFT    TMS       DIT
8       7        1        0               100       100
16      15       5        2               86.7      60
32      31       15       8               74.2      46.7
64      63       37       22              65.1      40.5
128     127      83       52              59.1      37.35
256     255      177      114             55.3      35.6
512     511      367      240             53.1      34.7
1024    1023     749      494             51.7      34.1
2048    2047     1515     1004            49.1      33.7
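The reduction percentages in Tables 1 to 3 appear to follow the usual relative-saving formula, (conventional − proposed) / conventional × 100. This is inferred from the tabulated values rather than stated in the text. A short Python sketch, with a hypothetical function name, reproduces a few entries:

```python
def reduction_percent(conventional, proposed):
    """Relative saving of the proposed count versus a conventional count, in percent."""
    return 100.0 * (conventional - proposed) / conventional


# N = 8 row of Table 1 (real multiplications): TMS = 48, DIT = 8, Radix-2^3 = 4.
t1_tms = reduction_percent(48, 4)
t1_dit = reduction_percent(8, 4)

# N = 16 row of Table 3 (coefficient memory accesses): TMS = 15, DIT = 5, Radix-2^3 = 2.
t3_tms = reduction_percent(15, 2)
t3_dit = reduction_percent(5, 2)
```

Rounded to one decimal place, these evaluate to 91.7, 50.0, 86.7, and 60.0, matching the corresponding table entries.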

Table 4 reveals simulation results of the conventional methods versus the Radix-2³ FFT method, where the term “Loss” is defined as the ratio of the conventional method over the Radix-2³ FFT method.

TABLE 4 Comparative results in terms of clock cycles of the conventional methods versus the Radix-2³ FFT method for different FFT sizes

                                          Cycle reductions (%)
Length  TMS      DIT      Proposed        TMS       DIT
64      5252     4210     3648            43.97     15.41
128     11363    9048     7612            49.28     18.86
256     24578    19246    15832           55.24     21.56
512     53025    40676    32852           61.41     23.82
1024    113984   85594    68048           67.51     25.78
2048    244063   179536   140748          73.40     27.56
4096    520574   375622   290760          79.04     29.19

The ratio of the conventional method over the Radix-2³ FFT method is described below with respect to FIG. 13.

FIG. 13 depicts a graph 1300 of a percentage reduction of clock cycles as a function of the FFT length for a TMS clock and a DIT clock, in accordance with certain embodiments of the present disclosure. The percentage reduction in clock cycles appears to increase substantially linearly as the FFT length (N) increases for the implementation of the Radix-2³ FFT method as compared to the reference. At an FFT length of log2(12), the Radix-2³ FFT method provides a 60% reduction in clock cycles as compared to the reference algorithm.
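The percentages listed in Table 4 are consistent with the ratio of the conventional cycle count to the Radix-2³ cycle count, expressed as the excess over one, which agrees with the “Loss” definition given above. The following Python sketch reflects that inference from the tabulated numbers; the function name is hypothetical.

```python
def cycle_loss_percent(conventional_cycles, proposed_cycles):
    """Excess of the conventional cycle count over the proposed one, in percent."""
    return 100.0 * (conventional_cycles / proposed_cycles - 1.0)


# Length-64 row of Table 4: TMS = 5252, DIT = 4210, Proposed = 3648.
loss_tms_64 = cycle_loss_percent(5252, 3648)
loss_dit_64 = cycle_loss_percent(4210, 3648)

# Length-4096 row of Table 4: TMS = 520574, DIT = 375622, Proposed = 290760.
loss_tms_4096 = cycle_loss_percent(520574, 290760)
loss_dit_4096 = cycle_loss_percent(375622, 290760)
```

Under this reading, a value of 43.97 for TMS at length 64 means that the TMS code takes roughly 1.44 times as many cycles as the Radix-2³ FFT method.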

TABLE 5 Comparison of the coefficient multiplier's memory requirement of the conventional methods versus the Radix-2³ FFT method, where the size is computed in terms of bytes

FFT Length    TMS    DIT        Proposed
N             2N     N/2 − 2    N/8 − 1

As can be seen from Table 5, the method described herein achieves a significant reduction in the coefficient multiplier's memory requirements in terms of bytes. In particular, the method described herein requires a coefficient memory of one less than N/8 bytes (N/8 − 1), as compared to two less than N/2 bytes (N/2 − 2) for the DIT method and 2N bytes for the TMS method.
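As a worked illustration of the Table 5 expressions, the following Python sketch evaluates the three memory sizes for a given FFT length. The function name is hypothetical, and N is assumed to be a power of two no smaller than 8.

```python
def coefficient_memory_bytes(N):
    """Coefficient multiplier memory sizes from Table 5 for an FFT of length N."""
    return {
        "TMS": 2 * N,
        "DIT": N // 2 - 2,
        "Proposed": N // 8 - 1,
    }


# Example length chosen for illustration only.
sizes_1024 = coefficient_memory_bytes(1024)
```

For N = 1024 the expressions give 2048 for TMS, 510 for DIT, and 127 for the proposed method.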

FIG. 14 depicts a block diagram of a signal processing system 1400 including a Radix-2³ FFT butterfly structure, in accordance with certain embodiments of the present disclosure. The system 1400 may include a digital signal processing (DSP) circuit 1402 having an input coupled to an analog-to-digital converter (ADC) 1404, which may be configured to provide a digital input stream to the DSP circuit 1402. The DSP circuit 1402 may further include an output coupled to a processor core 1406 or to another circuit or device. Other embodiments are also possible.

In some embodiments, the DSP circuit 1402 may include a low-pass filter 1408 including an input coupled to the output of the ADC 1404 and including an output. The DSP circuit 1402 may further include a radix-2³ FFT module 1410 including an input coupled to the low-pass filter 1408 and including an output coupled to the processor core 1406 through an input/output (I/O) interface 1412.

The systems, methods, and devices described above with respect to FIGS. 1-14 provide an efficient ordered-input, ordered-output radix-2³ algorithm that reduces the complexity and the computational effort in comparison to conventional methods. Furthermore, the systems, methods, and devices demonstrate a significant improvement in execution time in terms of clock cycles compared to the conventional methods. In certain embodiments, the systems, methods, and devices may be configured to predict the 8th root of unity and to reduce the memory size needed to store the coefficient multiplier to N/8. Accordingly, each of these improvements may contribute, individually and collectively, to an efficiency gain with respect to the processor, which may be realized in terms of faster processing, reduced memory consumption, reduced power consumption, and other improvements.

Although the present invention has been described with reference to preferred embodiments, workers skilled in the art will recognize that changes may be made in form and detail without departing from the scope of the invention.

What is claimed is:
 1. A circuit comprising: an input configured to receive a signal; and a radix-2³ fast Fourier transform (FFT) processing element coupled to the input and configured to control variation of twiddle factors during calculation of a complete FFT through a plurality of processing stages of an FFT process, the radix-2³ FFT processing element configured to incorporate the twiddle factors and adder tree matrices of the calculation into a single stage.
 2. The circuit of claim 1, wherein data input to the radix-2³ FFT processing element and data output by the radix-2³ FFT processing element are in natural order during each stage of the plurality of processing stages of the FFT process.
 3. The circuit of claim 1, wherein data within the radix-2³ FFT processing element are grouped with their corresponding coefficients multipliers during each stage of the plurality of processing stages of the FFT process.
 4. The circuit of claim 1, wherein a total number of shifts during each stage in the plurality of processing stages of an FFT process configured to perform a decimation in time (DIT) process is represented as r^(s).
 5. The circuit of claim 1, wherein a total number of shifts during each stage in the plurality of processing stages of an FFT process configured to perform a decimation in frequency (DIF) process is represented as r^((S−s)).
 6. The circuit of claim 1, wherein trivial multiplication by one operations are avoided during the plurality of processing stages of the FFT process.
 7. A circuit comprising: an input configured to receive a signal; and a radix-2³ fast Fourier transform (FFT) processing element coupled to the input and configured to control variation of twiddle factors during calculation of a complete FFT through one or more stages.
 8. The circuit of claim 7, wherein the radix-2³ FFT processing element is configured to incorporate the twiddle factors and adder tree matrices of the calculation into a single stage.
 9. The circuit of claim 7, wherein data input to the radix-2³ FFT processing element and data output by the radix-2³ FFT processing element are in natural order during each stage of the one or more stages.
 10. The circuit of claim 7, wherein the radix-2³ FFT processing element is configured to: determine data from the signal at the input; group each data element from the determined data with its corresponding coefficient multiplier to form grouped data; and process the grouped data to produce an output signal.
 11. The circuit of claim 7, wherein the radix-2³ FFT processing element is configured to perform a decimation in time (DIT) process having a number of shifts corresponding to a size N of the input data divided by the radix.
 12. The circuit of claim 7, wherein the radix-2³ FFT processing element is configured to perform a decimation in frequency (DIF) process having a number of shifts corresponding to a number of words minus a number of stages.
 13. The circuit of claim 7, wherein the radix-2³ FFT processing element avoids multiplication-by-one operations during the one or more stages of the FFT.
 14. A circuit comprising: an input configured to receive a signal; and a radix-r fast Fourier transform (FFT) processing element coupled to the input, the radix-r FFT processing element configured to: receive an input signal having a number of bits N; reverse a bit order of the bits N; decompose the bit order into groups of bits based on a base of a radix of the radix-r FFT processing element; and process the groups of bits together with their coefficients to produce an output signal.
 15. The circuit of claim 14, wherein the radix-r FFT processing element is configured to control variation of twiddle factors during calculation of an FFT through one or more stages of an FFT process.
 16. The circuit of claim 14, wherein the radix-r FFT processing element is configured to incorporate the twiddle factors and adder tree matrices of the calculation into a single stage.
 17. The circuit of claim 14, wherein data input to the radix-r FFT processing element and data output by the radix-r FFT processing element are in natural order during each stage of the one or more stages.
 18. The circuit of claim 14, wherein the radix-r FFT processing element is configured to: determine data from the signal at the input; group each data element from the determined data with its corresponding coefficient multiplier to form grouped data; and process the grouped data to produce an output signal.
 19. The circuit of claim 14, wherein the radix-r FFT processing element is configured to: perform a decimation in time (DIT) process having a number of shifts corresponding to a size N of the input data divided by the radix; and perform a decimation in frequency (DIF) process having a number of shifts corresponding to a number of words minus a number of stages.
 20. The circuit of claim 14, wherein the radix-r FFT processing element includes a radix-2³ FFT processing element to avoid multiplication-by-one operations during processing within the one or more stages.